A METHOD FOR PRODUCING, IN YEAST, A HYDROXYLATED
TRIPLE HELICAL PROTEIN, AND YEAST HOST CELLS USEFUL
IN SAID METHOD

This is a Continuation-in-part Application of Application No. 09/297,269, filed 
28 April, 1999, which in turn is a National stage filing under 35 U.S.C. § 371 of 
PCT/AU97/00721, filed October 29, 1997. 

Field of the Invention: 

This invention relates to the production of hydroxylated triple helical proteins 
such as natural and synthetic collagens, natural and synthetic collagen fragments, and 
natural and synthetic collagen-like proteins, by recombinant DNA technology. In 
particular, the invention relates to a method for producing hydroxylated triple helical 
proteins in yeast host cells by introducing to a suitable yeast host cell, DNA sequences 
encoding the triple helical protein as well as prolyl 4-hydroxylase (P4H), in a manner 
wherein the introduced DNA sequences are replicated, stably retained and segregated 
by the yeast host cells. 

Background of the Invention: 

The collagen family of proteins represents the most abundant protein in 
mammals, forming the major fibrous component of, for example, skin, bone, tendon, 
cartilage and blood vessels. Each collagen protein consists of three polypeptide chains 
(alpha chains) characterised by a (GlyXY)n repeating sequence, which are folded into a
triple helical protein conformation. Type I collagen (typically found in skin, tendon,
bone and cornea) consists of two types of polypeptide chain termed α1(I) and α2(I)
[i.e. α1(I)2 α2(I)], while other collagen types such as Type II [α1(II)3] and Type III
[α1(III)3] have three identical polypeptide chains. These collagen proteins
spontaneously aggregate to form fibrils which are incorporated into the extracellular 
matrix where, in mature tissue, they have a structural role and, in developing tissue, 
they have a directive role. The collagen fibrils, after cross-linking, are highly insoluble 
and have great tensile strength. 

The ability of collagen to form insoluble fibrils makes it attractive for
numerous medical applications including bioimplant production, soft tissue
augmentation and wound/burn dressings. To date, most collagens approved for these
applications have been obtained from animal sources, primarily bovine. While such
animal-sourced collagens have been successful, there is some concern that their use
risks serious immunogenicity problems and transmission of infective diseases and
spongiform encephalopathies (e.g. bovine spongiform encephalopathy (BSE)).
Accordingly, there is significant interest in the development of methods of production
of collagens or collagen fragments by recombinant DNA technology. Further, the use
of recombinant DNA technology is desirable in that it allows for the potential
production of synthetic collagens and collagen fragments which may include, for
example, exogenous biologically active domains (i.e. to provide additional protein
function) and other useful characteristics (e.g. improved biocompatibility and
stability).

The in vivo biosynthesis of collagen proteins is a complex process involving
many post-translational events. A key event is the hydroxylation by the enzyme prolyl
4-hydroxylase (P4H) of prolyl residues in the Y-position of the repeating (GlyXY)n
sequences to 4-hydroxyproline. This hydroxylation has been found to be beneficial for
nucleation of folding of triple helical proteins. For collagens, it is essential for stability
at body temperature. Accordingly, the development of a commercially viable method
for the production of recombinant collagen requires co-expression of P4H with the
alpha chains. For mammalian host cells, co-expression of P4H will occur
autonomously since these cells should naturally express P4H. However, for yeast host
cells, which for reasons of cost, ease and efficiency are more attractive for expression
of recombinant eukaryotic proteins, transformation with DNA sequences encoding P4H
will also be required. Since P4H consists of α and β subunits of about 60 kDa and 60
kDa, yeast host cells for expression of recombinant collagen will require
co-transformation with at least three exogenous DNA sequences (i.e., encoding an alpha
chain, P4H α subunit and P4H β subunit) and stability problems would therefore be
expected if these were cloned on three separate vectors or, alternatively, all on a single
episomal type vector. Indeed, even under continuous selection pressure, many episomal
type vectors suffer stability problems if they are large or are present at relatively low
copy number. An object of the present invention is therefore to provide a method for
expressing recombinant collagen and other triple helical proteins from yeast host cells
wherein the introduced DNA sequences are stably retained and segregated independent of
continuous selection pressure.

Summary of the Invention: 

Thus, in a first aspect, the present invention provides a method of producing, in
yeast, a hydroxylated triple helical protein, said method comprising the steps of:
(A) introducing into a suitable yeast host cell: 



(i) a first DNA molecule comprising a DNA sequence encoding prolyl
4-hydroxylase α-subunit (P4Hα) operably linked to a promoter functional in said yeast
host cell,

(ii) a second DNA molecule comprising a DNA sequence encoding prolyl
4-hydroxylase β-subunit (P4Hβ) operably linked to a promoter functional in said yeast
host cell, and

(iii) a third DNA molecule comprising a DNA sequence encoding a 
polypeptide or peptide operably linked to a promoter functional in said yeast host cell, 
wherein said polypeptide or peptide is one which, when hydroxylated, forms said 
hydroxylated triple helical protein, and wherein said polypeptide or peptide is a 
synthetic polypeptide or peptide represented by the following formula: 

(A)l-(B)m-[Z]-(C)o-(D)p,

wherein;

Z is a domain comprising two or more repeat units of the formula:

[(E)q-(GlyXY)i-(F)r]

wherein;

E and F represent sequences of one or more amino acids, which
sequences may vary from repeat unit to repeat unit, and for each repeat unit q
and r are each independently selected from 0 and 1, and

i is > 1 such that domain Z comprises 2 to 1500 GlyXY triplets,

Gly represents glycine, and

X and Y, which may be the same or different, represent an
amino acid, and wherein the identity of each amino acid represented by X and Y
may vary from GlyXY triplet to GlyXY triplet, but wherein at least one Y of the
(GlyXY)i sequence must be proline,

A and D, which may be the same or different, each represent a polypeptide or
peptide domain which optionally comprises a triple helical forming repeating sequence
(GlyXY)n, and l and p are each independently selected from 0 and 1,

B and C, which may be the same or different, each represent a polypeptide or
peptide domain which is heterologous to collagen proteins and which does not
comprise a triple helical forming repeating sequence (GlyXY)n, and m and o are each
independently selected from 0 and 1; and



(B) culturing the resulting yeast host cell of step (A) under conditions suitable
to express said P4Hα and P4Hβ and said synthetic polypeptide or peptide, to produce
said hydroxylated triple helical protein;

wherein during culturing in step (B), each of said first DNA molecule, said
second DNA molecule and said third DNA molecule are replicated, stably retained and
segregated by the yeast host cell.

In a second aspect, the present invention provides a yeast host cell capable of 
producing a hydroxylated triple helical protein, said yeast host cell including: 

(i) a first DNA sequence encoding prolyl 4-hydroxylase α-subunit (P4Hα)
operably linked to a promoter functional in said yeast host cell,

(ii) a second DNA sequence encoding prolyl 4-hydroxylase β-subunit
(P4Hβ) operably linked to a promoter functional in said yeast host cell, and

(iii) a third DNA sequence encoding a polypeptide or peptide operably
linked to a promoter functional in said yeast host cell, wherein said polypeptide or
peptide is one which, when hydroxylated, forms said hydroxylated triple helical
protein, and wherein said polypeptide or peptide is a synthetic polypeptide or peptide
represented by the following formula:

(A)l-(B)m-[Z]-(C)o-(D)p,

wherein;

Z is a domain comprising two or more repeat units of the formula:

[(E)q-(GlyXY)i-(F)r]



wherein; 

E and F represent sequences of one or more amino acids, which 
sequences may vary from repeat unit to repeat unit, and for each repeat unit q 
and r are each independently selected from 0 and 1, and 
i is > 1 such that domain Z comprises 2 to 1500 GlyXY triplets,

Gly represents glycine, and

X and Y, which may be the same or different, represent an
amino acid, and wherein the identity of each amino acid represented by X and Y
may vary from GlyXY triplet to GlyXY triplet, but wherein at least one Y of the
(GlyXY)i sequence must be proline,

A and D, which may be the same or different, each represent a polypeptide or
peptide domain which optionally comprises a triple helical forming repeating sequence
(GlyXY)n, and l and p are each independently selected from 0 and 1,

B and C, which may be the same or different, each represent a polypeptide or
peptide domain which is heterologous to collagen proteins and which does not
comprise a triple helical forming repeating sequence (GlyXY)n, and m and o are each
independently selected from 0 and 1; and

wherein each of said first DNA sequence, said second DNA sequence and said 
third DNA sequence are replicated, stably retained and segregated by the yeast host 
cell. 

In a third aspect, the present invention provides a hydroxylated triple helical
protein comprising a polypeptide or peptide which is a synthetic polypeptide or peptide
represented by the following formula:

(A)l-(B)m-[Z]-(C)o-(D)p,

wherein;

Z is a domain comprising two or more repeat units of the formula:

[(E)q-(GlyXY)i-(F)r],

wherein;

E and F represent sequences of one or more amino acids, which
sequences may vary from repeat unit to repeat unit, and for each repeat unit q
and r are each independently selected from 0 and 1, and

i is > 1 such that domain Z comprises 2 to 1500 GlyXY triplets,

Gly represents glycine, and

X and Y, which may be the same or different, represent an
amino acid, and wherein the identity of each amino acid represented by X and Y
may vary from GlyXY triplet to GlyXY triplet, but wherein at least one Y of the
(GlyXY)i sequence must be proline,

A and D, which may be the same or different, each represent a polypeptide or
peptide domain which optionally comprises a triple helical forming repeating sequence
(GlyXY)n, and l and p are each independently selected from 0 and 1,

B and C, which may be the same or different, each represent a polypeptide or
peptide domain which is heterologous to collagen proteins and which does not
comprise a triple helical forming repeating sequence (GlyXY)n, and m and o are each
independently selected from 0 and 1.

In a fourth aspect, the present invention provides a biomaterial or therapeutic
product comprising a hydroxylated triple helical protein according to the third aspect.

Detailed disclosure of the Invention: 

The method according to the invention requires that the first and second
nucleotide sequences encoding the P4H α and β subunits and the third nucleotide
sequence (i.e. the "product-encoding nucleotide sequence") be introduced to a suitable
yeast host cell in a manner such that they are borne on one or more DNA molecules
that are replicated, stably retained and segregated by the yeast host cell during
culturing. In this way, all daughter cells will include the first, second and
product-encoding nucleotide sequences and thus stable and efficient expression of a
hydroxylated triple helical protein product can be ensured throughout the culturing step
and without the use of continuous selection pressure.

The method according to the invention can be achieved by: (i) integrating (e.g.
by homologous recombination) one or more of the exogenous nucleotide sequences
(i.e. one or more of the first, second and product-encoding nucleotide sequences) into
one or more chromosome(s) of the yeast host cell, or (ii) including one or more of the
exogenous nucleotide sequences within one or more vector(s) including a centromere
(CEN) sequence(s). Alternatively, a combination of these techniques may be used or
one or both of these techniques may be used in combination with the use of one or two
high copy number plasmid(s) which include the remainder of the exogenous nucleotide
sequences. For example, the first and second nucleotide sequences encoding the P4H α
and β subunits may be integrated into a host chromosome while the product-encoding
sequences may be included on vector(s) including a CEN sequence or on a high copy
number vector(s).

Preferably, the method of the invention is achieved by including the exogenous
nucleotide sequences within a vector(s) including a CEN sequence. Particularly
preferred are the CEN sequence-including YAC (yeast artificial chromosome) vectors
(Cohen et al., 1993) and pYEUra3 vectors (Clontech, Cat. No 6195-1). Other vectors
including a CEN sequence may be generated by cloning a CEN sequence into any
suitable expression vector.

Where one or more of the exogenous nucleotide sequences are included in a
high copy number vector(s), it is preferred that the high copy number vector(s) is/are
selected from those that may be present at 20 to 500 (preferably, 400 to 500) copies per
host cell. Particularly preferred high copy number vectors are the YEp vectors.

The method according to the invention enables the production of hydroxylated
triple helical proteins. The term "triple helical protein" is to be understood as referring
to a homo- or heterotrimeric protein consisting of polypeptide(s) or peptide(s) which
include at least a region having the general peptide formula: (GlyXY)n, in which Gly is
glycine, X and Y represent the same or different amino acids (the identities of which
may vary from GlyXY triplet to GlyXY triplet) but wherein X and Y are frequently
proline which in the case of Y becomes, after modification, hydroxyproline (Hyp), and
n is in the range of 2 to 1500 (preferably 10 to 350), which region forms, together with
the same or similar regions of two other polypeptides or peptides, a triple helical
protein conformation. The triple helical proteins may include non-collagenous,
non-triple helical domains at the amino and/or carboxy terminal ends or elsewhere.

Product-encoding nucleotide sequences used in the present invention encode a
polypeptide or peptide of the general formula: (A)l-(B)m-[Z]-(C)o-(D)p, wherein;
Z is a domain comprising two or more repeat units of the formula:
[(E)q-(GlyXY)i-(F)r], wherein E and F represent sequences of one or more amino acids
(which sequences may vary from repeat unit to repeat unit), and for each repeat unit q
and r are each independently selected from 0 and 1 and i is > 1 such that domain Z
comprises 2 to 1500 GlyXY triplets (preferably, 10 to 300 GlyXY triplets), Gly
represents glycine, and X and Y (which may be the same or different), represent an
amino acid, and wherein the identity of each amino acid represented by X and Y may
vary from GlyXY triplet to GlyXY triplet, but wherein at least one Y of the (GlyXY)i
sequence must be proline, A and D (which may be the same or different), each represent
a polypeptide or peptide domain which optionally comprises a triple helical forming
repeating sequence (GlyXY)n, and l and p are each independently selected from 0 and 1,
B and C (which may be the same or different), each represent a polypeptide or peptide
domain which is heterologous to collagen proteins and which does not comprise a triple
helical forming repeating sequence (GlyXY)n, and m and o are each independently
selected from 0 and 1.

Preferably, in domain Z, the component (GlyXY)i has an amino acid length
which is at least three times greater than the combined amino acid length of the 
components E and F. Of course, in accordance with the formula of Z given above, one 
or both of E and F may be absent. 
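
The constraints on domain Z described above can be sketched as a simple check in Python. This is only an illustration under stated assumptions: the function name check_domain_z, the modelling of each repeat unit as an (E, GlyXY run, F) tuple of one-letter amino acid codes, the reading of "at least three times greater" as a per-unit length ratio of at least 3, and the example sequences are all hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Tuple

# Hypothetical model: one repeat unit = (E, GlyXY run, F) in one-letter codes.
# E or F is "" when q or r is 0 for that unit.
RepeatUnit = Tuple[str, str, str]


def check_domain_z(units: List[RepeatUnit]) -> Dict[str, object]:
    """Check a candidate domain Z against the constraints stated in the text."""
    if len(units) < 2:
        raise ValueError("domain Z comprises two or more repeat units")
    triplets = 0
    proline_in_y = False
    ratio_ok = True
    for e, run, f in units:
        if not run or len(run) % 3:
            raise ValueError("each GlyXY run must be a whole number of triplets")
        for k in range(0, len(run), 3):
            if run[k] != "G":
                raise ValueError(f"residue {k} of run {run!r} is not Gly")
            if run[k + 2] == "P":  # Y position of this triplet
                proline_in_y = True
            triplets += 1
        # Stated preference: GlyXY component at least 3x the length of E plus F
        # (assumed here to apply to each repeat unit separately).
        if len(run) < 3 * (len(e) + len(f)):
            ratio_ok = False
    return {
        "triplets": triplets,
        "triplets_in_range": 2 <= triplets <= 1500,
        "proline_in_y": proline_in_y,
        "preferred_length_ratio": ratio_ok,
    }


# Illustrative input only: two repeat units, the first with a two-residue F.
report = check_domain_z([("", "GPPGERGPP", "AS"), ("", "GPPGERGPP", "")])
```

A unit whose E and F together are long relative to its GlyXY run is still accepted by this check, but is reported with preferred_length_ratio set to False, mirroring the fact that the ratio is a preference rather than a requirement.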

The portion of the product-encoding nucleotide sequence(s) encoding the repeat
unit of domain Z may be generated through the use of polymerase chain reaction (PCR)
techniques or chemical DNA synthesis. Stepwise addition of such nucleotide
sequences, so as to generate the repeating nucleotide sequence for domain Z may be
achieved by utilising different restriction sites at the termini of primers used to produce
the PCR fragments or through variations in chemical DNA synthesis. The selected
restriction sequences are such that the desired linear order of the repeated nucleotide
sequences is achieved in a manner which maintains the overall phase or open reading
frame of the product-encoding nucleotide sequence and which ensures that every third
amino acid of the encoded Z domain is Gly. Example 7 hereinafter provides an
example of this strategy. That is, Example 7 describes a strategy for producing a
product-encoding nucleotide sequence encoding a domain Z with three repeat
sequences derived from an integrin binding site of Type III collagen, wherein the first
step was to clone an EcoRI-[(GlyXY)n]-Bsp120I PCR fragment, the second step added
a Bsp120I-[(GlyXY)n]-BssHII fragment, and the third step added a BssHII-[(GlyXY)n]-
SacII fragment, wherein the amino acid sequence of the polypeptide encoded by each
fragment is the same. This strategy can be readily extended to add a nucleotide sequence
encoding fourth, fifth etc. repeat units.
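
The two requirements placed on the stepwise assembly, namely that the reading frame is maintained and that every third amino acid of the encoded Z domain is Gly, can be checked computationally. The following Python sketch is a minimal illustration under stated assumptions: the fragment sequence is hypothetical (it is not the Example 7 sequence), the codons contributed by the restriction-site junctions are left out for brevity, and the helper names are invented for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; reject lengths that break the frame."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3: reading frame not maintained")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gly_every_third(protein: str) -> bool:
    """True if residues 1, 4, 7, ... are all glycine."""
    return all(aa == "G" for aa in protein[::3])


# Hypothetical repeat fragment encoding Gly-Pro-Pro-Gly-Glu-Arg.
FRAGMENT = "GGTCCACCAGGTGAAAGA"

# Three successive additions of the same coding unit.
construct = "".join([FRAGMENT] * 3)
protein = translate(construct)
in_register = gly_every_third(protein)
```

A fragment whose length is not a multiple of three would shift the frame of everything downstream of it, which is why translate rejects such input rather than silently truncating it.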

Alternatively, the portion of the product-encoding nucleotide sequence(s)
encoding the repeat unit of domain Z may be generated using DNA ligase to join
non-palindromic nucleotide sequences, which may be produced by PCR techniques or
chemical DNA synthesis, end to end in such a manner as to maintain the open reading
frame of the product-encoding nucleotide sequence and which ensures that every third
amino acid of the encoded Z domain is Gly. The use of complementary, but
non-palindromic, overhanging sequences at the ends of the designed non-palindromic
nucleotide sequences ensures that they are joined in a consistent head to tail orientation.
Further, the strategy allows the ready linking of nucleotide sequences encoding other
polypeptide or peptide domains (e.g. the abovementioned A, B, C and/or D), by
utilising terminal non-palindromic overhang sequences which result in the generation
of a restriction enzyme site. This restriction enzyme site can then be used for the
additional cloning of nucleotide sequences encoding the other polypeptide or peptide
domains.
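
Why non-palindromic overhangs enforce a head to tail orientation can be illustrated with a short Python sketch. An overhang that equals its own reverse complement can anneal to a copy of itself, so a fragment carrying it may ligate in either orientation; a non-palindromic overhang can only anneal to its distinct complementary partner. The function names and the GGTC/GACC overhang pair below are assumptions chosen for illustration and do not come from the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(overhang: str) -> bool:
    """A palindromic overhang can pair with itself, allowing either orientation."""
    return overhang.upper() == reverse_complement(overhang)


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """Two single-stranded overhangs (each 5' to 3') pair if one is the
    reverse complement of the other."""
    return overhang_a.upper() == reverse_complement(overhang_b)


# The EcoRI overhang AATT is palindromic, so it can pair with itself.
ecori_self_pairs = can_anneal("AATT", "AATT")

# Hypothetical designed pair: tail overhang GGTC, head overhang GACC.
tail, head = "GGTC", "GACC"
head_to_tail = can_anneal(tail, head)   # the intended junction
tail_to_tail = can_anneal(tail, tail)   # an inverted junction
```

With the hypothetical pair, only the tail of one unit can join the head of the next, which is the behaviour the text relies on to build long, uniformly oriented repeats.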

This latter approach to the generation of the product-encoding nucleotide 
sequence(s) is preferred when the domain Z is to comprise a large number of repeat 
units (e.g. > 10 repeat units). 

The product-encoding nucleotide sequence(s) may include a sequence(s)
encoding a secretion signal so that the polypeptide(s) or peptide(s) expressed from the
product-encoding nucleotide sequence(s) are secreted.

The product-encoding nucleotide sequence(s) also comprise a nucleotide
sequence preferably selected to match the codon use preferences of the selected yeast
expression host, and is also constructed so as to minimise the potential for GGG to
CCC interactions that may destabilize the structure.
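
One way to act on the stated aim of minimising GGG to CCC interactions is to scan a candidate coding sequence for runs of G or C and then choose synonymous codons that avoid them. The Python sketch below only performs the scan. The function name, the run-length threshold of three, and both example sequences are assumptions for illustration; the specification does not prescribe a particular procedure.

```python
import re
from typing import List, Tuple


def find_gc_runs(dna: str, min_len: int = 3) -> List[Tuple[int, str]]:
    """Return (start index, run) for every run of G or of C at least min_len long."""
    pattern = re.compile("|".join(f"{base}{{{min_len},}}" for base in "GC"))
    return [(m.start(), m.group()) for m in pattern.finditer(dna.upper())]


# Two hypothetical encodings of the same peptide, Gly-Pro-Gly-Glu.
with_runs = "GGTCCCGGGGAA"      # GGT CCC GGG GAA
without_runs = "GGTCCAGGTGAA"   # GGT CCA GGT GAA

flagged = find_gc_runs(with_runs)
clean = find_gc_runs(without_runs)
```

Because Gly and Pro are both encoded by several codons, a sequence rich in GlyXY triplets usually leaves room to remove flagged runs without changing the encoded protein, although the host's codon preferences constrain the choice.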

Expression of the product-encoding nucleotide sequence(s) may be driven by
constitutive yeast promoter sequences (e.g. ADH1 (Hitzeman et al., 1981; Pihlajaniemi
et al., 1987), HIS3 (Mahadevan & Struhl, 1990), 786 (no author given, 1996
Innovations 5, 15) and PGK1 (Tuite et al., 1982)), but more preferably, by inducible
yeast promoter sequences such as GAL1-10 (Goff et al., 1984), GAL7 (St. John &
Davis, 1981), ADH2 (Thukral et al., 1991) and CUP1 (Macreadie et al., 1989).

The first and second nucleotide sequences encoding the P4H α and β subunits
can be of any animal origin although they are preferably of avian or mammalian,
particularly human, origin (Helaakoski et al., 1989). It is also envisaged that the first
and second nucleotide sequences may originate from different species. In addition, the
second nucleotide sequence encoding the P4H β subunit may include a sequence
encoding an endoplasmic reticulum (ER) retention signal (e.g. HDEL (SEQ ID NO:13),
KDEL (SEQ ID NO:42) or KEEL (SEQ ID NO:43)) with or without other target
signals so as to allow expression of the P4H in the ER, cytoplasm or a target organelle
or, alternatively, so as to be secreted.

Expression of the first and second nucleotide sequences may be driven by
constitutive or inducible yeast promoter sequences such as those mentioned above. It is
believed, however, that it is advantageous to achieve expression of the α and β subunits
in a co-ordinated manner using the same or different promoter sequences with the same
induction characteristics, but preferably by the use of a bidirectional promoter
sequence. Accordingly, it is preferred that the first and second nucleotide sequences be
expressed by the yeast GAL1-10 bidirectional promoter sequence, although other
bidirectional promoter sequences would also be suitable.

Multiple copies of the first, second and/or product-encoding nucleotide
sequences may be introduced to the yeast host cell (e.g. present on a YAC vector or
integrated into a host chromosome). It may be particularly advantageous to provide the
product-encoding nucleotide sequence(s) in multicopy and, accordingly, it may be
preferred to introduce the product-encoding nucleotide sequence(s) on a high copy
number plasmid (e.g. a YEp plasmid).

The introduced first, second and product-encoding nucleotide sequences may be
borne on one or more stably retained and segregated DNA molecules. Where borne on
more than one DNA molecule, the DNA molecules may be a combination of host
chromosome(s) and/or CEN sequence-including vector(s) in combination with high
copy number vector(s). Some specific examples of yeast host cells suitable for use in
the method according to the invention, are transformed with the following DNA
molecules:



1. YEp-P3 + pYEUra3-αβ,
2. YEp-P3 + pYACαβ,
3. YEpCEN-P3 + pYEUra3-αβ,
4. YEpCEN-P3 + pYACαβ,
5. pYAC-P3 + pYACαβ,
6. pYAC-P3 + pYEUra3-αβ,
7. pYACαβ-P3;

wherein P3 represents a product-encoding nucleotide sequence(s), α and β
represent, respectively, nucleotide sequences encoding the P4H α subunit and P4H β
subunit, and CEN represents an introduced centromere sequence. The pYEUra3 and pYAC
vectors include CEN sequences.
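
The seven combinations listed above can be written down as data and checked for the property the method depends on: every host must carry the α, β and product-encoding (P3) sequences. The Python sketch below is illustrative only. The dictionary layout and the names COMBINATIONS and summarise are assumptions; the backbone classification follows the statements in the text that pYEUra3 and pYAC include CEN sequences, that YEpCEN carries an introduced CEN, and that YEp vectors are high copy number vectors.

```python
REQUIRED = {"alpha", "beta", "P3"}
CEN_BACKBONES = {"YEpCEN", "pYEUra3", "pYAC"}   # CEN-containing per the text
P4H = frozenset({"alpha", "beta"})
PRODUCT = frozenset({"P3"})

# Each entry: list of (backbone, sequences carried), mirroring the numbered list.
COMBINATIONS = {
    1: [("YEp", PRODUCT), ("pYEUra3", P4H)],
    2: [("YEp", PRODUCT), ("pYAC", P4H)],
    3: [("YEpCEN", PRODUCT), ("pYEUra3", P4H)],
    4: [("YEpCEN", PRODUCT), ("pYAC", P4H)],
    5: [("pYAC", PRODUCT), ("pYAC", P4H)],
    6: [("pYAC", PRODUCT), ("pYEUra3", P4H)],
    7: [("pYAC", P4H | PRODUCT)],
}


def summarise(vectors):
    """Report whether a combination is complete and where the P4H genes sit."""
    carried = set().union(*(payload for _, payload in vectors))
    p4h_backbones = [b for b, payload in vectors if payload & P4H]
    return {
        "complete": carried == REQUIRED,
        "p4h_on_cen_vector": all(b in CEN_BACKBONES for b in p4h_backbones),
        "uses_high_copy_yep": any(b == "YEp" for b, _ in vectors),
    }


summary = {number: summarise(vectors) for number, vectors in COMBINATIONS.items()}
```

In this encoding, the P4H subunit sequences sit on a CEN-containing vector in every listed combination, and only combinations 1 and 2 place the product-encoding sequence on a plain high copy number YEp vector.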

Triple helical protein products produced in accordance with the method of the
invention may be purified from the yeast host cell culture by techniques including
standard chromatographic and precipitation techniques (Miller & Rhodes, 1982). For
synthetic collagens, pepsin treatment and NaCl precipitation at acid and neutral pH may
be used (Trelstad, 1982). Immunoaffinity chromatography can be used for constructs
that contain appropriate recognition sequences, such as the Flag sequence which is
recognised by an M1 or M2 monoclonal antibody, or a triple helical epitope, such as
that recognised by the antibody 2G8/B1 (Glattauer et al., 1997).

Yeast host cells suitable for use in the method according to the invention may
be selected from genera including, but not limited to, Saccharomyces, Kluyveromyces,
Schizosaccharomyces, Yarrowia and Pichia. Particularly preferred yeast host cells may
be selected from S. cerevisiae, K. lactis, S. pombe, Y. lipolytica and P. pastoris.

As indicated above, it is particularly preferred that the first, second and
product-encoding nucleotide sequences be introduced to the yeast host cell by
transformation with one or more YAC vectors. YAC vectors are linear DNA vectors which
include yeast CEN sequences, at least one autonomous replication signal (e.g. ars) usually
derived from yeast, and telomere ends (again, usually derived from yeast). They also
generally include a yeast selectable marker such as URA3, TRP1, LEU2, or HIS3, and
in some cases, an ochre suppressor (e.g. sup4-o) which allows for red/white selection in
adenine requiring strains (i.e. the mutation of the adenine gene being due to a
premature ochre stop codon). More commonly, two yeast selectable markers are
included, one on each arm of the artificial chromosome (each arm separated by the
CEN). This allows selection of only those transformed hosts containing YACs with
introduced sequences of interest within the desired restriction cloning site. That is,
correct insertion of the sequences of interest (e.g. an expression cassette) rejoins the
two arms of the restricted YAC, thus rendering transformants prototrophic for both
markers. YACs have been designed to allow for the introduction of large exogenous
nucleotide sequences (i.e. of the order of 100 kb or more) into yeast host cells. The
present inventors have hereinafter shown that such YACs may be used for the stable
expression of multiple exogenous nucleotide sequences (e.g. nucleotide sequences
encoding a natural collagen and both the α and β subunits of P4H).

In some embodiments of the invention, it may be preferred that one or more (but
not all) of the first, second and product-encoding nucleotide sequences be introduced to
the yeast host cell by transformation with one or two YEp vectors. YEp vectors carry
all or part of the yeast 2μ plasmid with at least the ori of replication. They also include
a yeast selectable marker such as HIS3, LEU2, TRP1, URA3, CUP1 or G418
resistance, and often also contain a separate ori, generally ColE1, and markers, such as
ampicillin resistance, for manipulation in E. coli. They show high copy number, for
example 20-400 per cell, and are generally efficiently segregated. Stability during cell
division is dependent on the vector also containing the REP2/STB locus from the 2μ
plasmid. However, stability is not as good as that of the endogenous 2μ plasmid of the
host, particularly when heterologous genes are induced for expression. Stability also
declines with increasing plasmid size (Wiseman, 1991).
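
The practical consequence of imperfect segregation can be illustrated with a deliberately simple model, offered here only as a sketch and not as part of the disclosure. It assumes that, without selection, each division fails to pass the plasmid to a daughter with a fixed probability, that plasmid-free and plasmid-bearing cells grow at the same rate, and that lost plasmids are never regained. The loss rates used below are hypothetical values chosen to show the trend; they are not measurements.

```python
def fraction_retaining(loss_per_division: float, generations: int) -> float:
    """Fraction of cells still carrying the plasmid after a number of generations,
    under the simplified fixed-loss model described in the lead-in."""
    if not 0.0 <= loss_per_division <= 1.0:
        raise ValueError("loss_per_division must be between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - loss_per_division) ** generations


# Hypothetical per-division loss rates, for illustration only.
less_stable = fraction_retaining(0.01, 50)
more_stable = fraction_retaining(0.0001, 50)
```

Even a small per-division loss compounds over the many generations of a production culture, which is the motivation given in the text for preferring chromosomal integration or CEN-containing vectors for sequences that must be retained without continuous selection.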

The method of the present invention enables the production of triple helical 
protein products with two or more repeat units, which allows control of biological 
function and permits the possibility of enhancing the efficacy of selected domains by, 
for example, increasing binding sites and avidity for interacting agents and by 
activation through receptor clustering. 

The terms "comprise", "comprises" and "comprising" as used throughout the 
specification are intended to refer to the inclusion of a stated component or feature or 
group of components or features with or without the inclusion of a further component 
or feature or group of components or features. 

The invention will now be described by way of reference to the following
non-limiting examples and accompanying figures.

Brief description of the accompanying figures:

Figure 1 shows, diagrammatically, the construction of the expression vector
pYEUra3.2.12β#39α#5 (labeled pYEUra3-βα).

Figure 2 shows the nucleotide sequence for the COLIII1.6 kb DNA (SEQ ID 
NO:39). 

Figure 3 shows, diagrammatically, regions of the human collagen III gene that 
have been isolated by PCR. The 1.6kb DNA used in the examples hereinafter is also 
shown. It is to be understood that the other regions shown in the figure could substitute 
for the COLIII1.6kb DNA in those examples. 

Figure 4 shows, diagrammatically, the construction of the expression vector
YEpFlagCOLIII1.6kb (labeled YEpFlag-C3).

Figure 5 shows, diagrammatically, the construction of pYAC5 βα.

Figure 6 shows, diagrammatically, the construction of pYAC βα-COLIII1.6 kb.

Figure 7 outlines the construction of synthetic collagen products. 

Figure 8 provides the nucleotide sequence (SEQ ID NO:40) for SYN-C3
together with the amino acid sequence (SEQ ID NO:41) of the encoded polypeptide. 

Example 1: Construction of a yeast vector for co-ordinated co-expression of the α
and β subunits of Prolyl-4-hydroxylase.

Production of yeast expression vector: 

pYEUra3 (Clontech) contains the bidirectional promoter for GAL1-10
expression. Induction by galactose in the absence of glucose results in high level
expression from pGAL1 of any protein encoded by DNA sequences inserted in the
correct orientation in the MCS (multiple cloning site) [either XhoI, SalI, XbaI or
BamHI sites] provided there is an initiating ATG start codon. For pGAL10, expression
induced by galactose occurs if the DNA sequences to be expressed are inserted in the
EcoRI site in frame with the ATG codon of GAL10.

In order to utilise the EcoRI site for cloning, without the necessity that the insert
be in frame with the ATG of GAL10 for expression, it was necessary to modify
pYEUra3 to remove the GAL10 initiation codon. This was done as follows. A PCR
fragment was generated using pYEUra3 as template and primers 3465
[5'-CTG.TAG.AGG.ATC.CCCGGG.TAC.GGA.GC-3' (SEQ ID NO:1), where the
BamHI site is underlined] and 1440
[5'-TTA.TAT.TGA.ATT.CTC.AAA.AAT.TC-3' (SEQ ID NO:2), where the EcoRI
restriction site is underlined]. Primer 1440 introduces an EcoRI site preceding the
initiating ATG of GAL10 in pYEUra3. The PCR fragment was restricted with BamHI
and EcoRI and cloned into pYEUra3 similarly digested with BamHI and EcoRI,
replacing the BamHI-EcoRI fragment containing an ATG start codon with a BamHI-
EcoRI fragment lacking this ATG, to generate plasmid pYEUra3.2.12. The EcoRI site
can then be used as a cloning site for which, as with the MCS at the other end of the
promoter, an initiating codon must be provided by the inserted DNA sequence. A
sequence inserted there is placed under control of the bidirectional pGAL1-10 promoter
and its expression is inducible by galactose, as is that of DNA sequences inserted in the
MCS at the other end of the promoter. Cloning DNA sequences in the MCS and in the
EcoRI site allows for co-ordinate expression by the bidirectional promoter when induced
by galactose.

Isolation of DNA molecules encoding the α and β subunits of P4H:

The α subunit of P4H was PCR amplified from cDNA (Clontech Human
Kidney Quick Clone™ cDNA Cat.#7112-1) using primers 1826
[5'-TGT.AAA.ATT.AAA.GGATCC.CAA.AG.ATG.TGG.TAT-3' (SEQ ID NO:3), where the BamHI
site is underlined, ATG is the initiating codon for the α subunit] and 1452
[5'-GCCG.GGA.TCC.TG.TCA.TTC.CAA.TGA.CAA.CGT-3' (SEQ ID NO:4), wherein
the BamHI site is underlined, TCA is the translation stop codon]. Two isoforms were
obtained and cloned into the BamHI site of pBluescript II SK+ [Stratagene Cat.#
212205] as storage vector to give pSK+α.1 (form I) and pSK+α.2 (form II). There are
no BamHI sites in the DNA encoding the α subunit. The signal sequence for secretion
is present in the BamHI fragment of both forms.
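
Because underlining is easily lost when primer sequences are reproduced in plain text, the engineered restriction sites can be located programmatically. The Python sketch below uses primers 1826 (SEQ ID NO:3) and 1440 (SEQ ID NO:2) as written in this example; the helper names are invented for illustration, and only the two recognition sequences relevant here (BamHI, GGATCC; EcoRI, GAATTC) are included.

```python
from typing import Dict

# Recognition sequences of the two enzymes used for these primers.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def clean(primer: str) -> str:
    """Strip the codon-separator dots, spaces and any other non-base characters."""
    return "".join(ch for ch in primer.upper() if ch in "ACGT")


def find_sites(primer: str) -> Dict[str, int]:
    """Map each enzyme whose site occurs in the primer to its 0-based start index."""
    seq = clean(primer)
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


PRIMER_1826 = "TGT.AAA.ATT.AAA.GGATCC.CAA.AG.ATG.TGG.TAT"   # SEQ ID NO:3
PRIMER_1440 = "TTA.TAT.TGA.ATT.CTC.AAA.AAT.TC"               # SEQ ID NO:2

sites_1826 = find_sites(PRIMER_1826)
sites_1440 = find_sites(PRIMER_1440)

# Position of the initiating ATG in primer 1826, searched downstream of the BamHI site.
atg_index = clean(PRIMER_1826).find("ATG", sites_1826["BamHI"] + len(SITES["BamHI"]))
```

The same check can be applied to the remaining primers of the example, extending SITES with further recognition sequences as required.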

The β subunit of P4H [also known as PDI/protein disulfide isomerase]
[Pihlajaniemi et al., 1987] was PCR amplified from cDNA (Clontech Human Kidney
Quick Clone™ cDNA Cat.#7112-1) using primer pair 2280
[5'-AC.TGG.ACG.GAT.CCC.GAG.CGC.CCC.GCC.TGC.TCC.GTG.TCC.GAC.ATG-3'
(SEQ ID NO:5)] and 2261 [5'-G.GTT.CTC.CTT.GGT.GAC.CTC.CCC.TT-3' (SEQ ID NO:6),
where the BstEII site is underlined] for the amino terminal part of the β subunit and
primer pair 2260 [5'-GAA.GGG.GAG.GTC.ACC.AAG.GAG.AAC-3' (SEQ ID NO:7), where
the BstEII site is underlined] and 1932
[5'-CC.TTC.AGG.ATC.CTA.TTA.GAC.TTC.ATC.TTT.CAAC.AGC-3' (SEQ ID NO:8)]
for the carboxy terminal part of the β subunit. The two PCR fragments for the β
subunit were then ligated together following BstEII digestion, to produce a single
fragment encoding the entire β subunit. This fragment was then amplified using
primer 2280 [5'-AC.TGG.ACG.GAT.CCC.GAG.CGC.CCC.GCC.TGC.TCC.GTC.TCC.GAC.ATG-3'
(SEQ ID NO:9), where the BamHI site is underlined, and ATG is the initiating codon
for the β subunit] and primer 1932
[5'-CC.TTC.AGG.ATC.CTA.TTA.GAC.TTC.ATC.TTT.CAC.AGC-3' (SEQ ID NO:10), where the
BamHI site is underlined, and TTA is the translation stop codon for the β subunit]
and then cloned into the BamHI site of pBluescript II SK+ to generate the storage
vector pSK+β.

Subsequently, the BamHI fragment of pSK+β was amplified by using primers 2698
[5'-CTA.GTT.GAA.TTC.TAC.ACA.ATG.CTG.CGC.CGC.GCT.CTG.CTG-3' (SEQ ID NO:11),
where the EcoRI site is underlined, and ATG is the initiating codon of the β
subunit] and 2699 [5'-GCA.ATG.GAA.TTC.TTA.TTA.CAG.TTC.GTG.CAC.AGC.TTT-3'
(SEQ ID NO:12), where the EcoRI site is underlined, and TTA.TTA. provides two
translation stop codons, and GTG. changes a lysine [K] residue to a histidine [H]
residue to provide a native yeast ER retention signal, HDEL (i.e. His.Asp.Glu.Leu
(SEQ ID NO:13)) rather than a mammalian KDAEL (SEQ ID NO:14) ER retention
signal]. The resultant PCR fragment was then blunt end cloned into the SrfI site of
pCRScript [Stratagene, Cat.# 211190] to generate pCRScriptβ. After retrieving the
EcoRI fragment containing the β subunit from pCRScriptβ by EcoRI digestion, the
fragment was again cloned into the EcoRI site of pCRScript to generate
pCRScriptβEcoRI#4.
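
Because primer 2699 is a reverse primer, the triplets named in its description (GTG, TTA.TTA) are written on the non-coding strand. A minimal sketch of how they read on the coding strand is given below. The helper name `reverse_complement` and the three-entry codon lookup are illustrative only; the codon meanings are taken from the standard genetic code.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Only the codons needed for this check (standard genetic code).
CODON_MEANING = {"CAC": "His", "TAA": "stop"}

his_codon = reverse_complement("GTG")        # triplet named in the primer text
stop_pair = reverse_complement("TTATTA")     # the TTA.TTA. pair in the primer

print(his_codon, CODON_MEANING[his_codon])
print(stop_pair[:3], stop_pair[3:])
```

Read on the coding strand, GTG corresponds to the histidine codon CAC and TTA.TTA corresponds to two consecutive TAA stop codons, in line with the description of the primer.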

Construction of yeast expression vector including fragment encoding the α and β
subunit of P4H:

The β subunit fragment was obtained as an EcoRI fragment from EcoRI
digestion of pCRScriptβEcoRI#4. This EcoRI fragment was cloned into the EcoRI site
of pYEUra3.2.12 to generate plasmid pYEUra3.2.12β#39. The α subunit fragment
was re-excised from pSK+α.1 by BamHI digestion and cloned into the BamHI site
of pYEUra3.2.12β#39 to give pYEUra3.2.12β#39α#5 (Figure 1). The β subunit
fragment is under control of pGAL10 and the α subunit fragment is under control of
pGAL1. Together these form a bidirectional promoter, which allows co-ordinated
induced expression of both subunits of prolyl-4-hydroxylase. Both fragments provide
a native ATG initiating codon for translation. The encoded β subunit has its own
secretion signal sequence and an HDEL endoplasmic reticulum (ER) retention sequence
at the carboxy terminus of the protein. While the encoded α subunit, with its own
signal sequence, has no ER retention signal, it should nevertheless be retained
through its interaction with the β subunit.

Example 2: Co-ordinated co-expression of a collagen segment and prolyl-4-
hydroxylase (α and β subunit) and synthesis of hydroxylated collagen Type III in
yeast

A 1.6 kbp recombinant collagen fragment was generated by PCR using primers
1989 [Forward primer
5'-gct.agc.aag.ctt.GGA.GCT.CCA.GGC.CCA.CTT.GGG.ATT.GCT.GGG-3' (SEQ ID NO:15)]
and 1903 [Reverse primer
5'-tcg.cga.tct.aga.TTA.TAA.AAA.GCA.AAC.AGG.GCC.AAC.GTC.CAC.ACC-3'
(SEQ ID NO:16)] homologous to a region of the collagen type III alpha 1 chain
(COL3A1). The template for isolation of the fragment of type III collagen alpha 1 chain
was prepared from Wizard purified DNA obtained from a cDNA library [HL1123n
Lambda Max 1 Clontech Lot#1245, Human Kidney cDNA 5'-Stretch Library].

The actual size of the isolated 1.6 kbp fragment is 1635 bp, comprising 1611 bp
of COL3A1 DNA flanked either side by 12 bp derived from the primers. The 1611 bp
of COL3A1 DNA corresponds to nucleotides #2713-4826 (i.e. codon #905-1442) of the
full-length coding sequence, thereby spanning a portion of the α-helix region, all of the
C-telo-peptide, all of the C-pro-peptide and stop codon.*1 The nucleotide sequence for
the COL3A1 DNA is provided at Figure 2. The region covered by the COL3A1 DNA
is shown at Figure 3. The 1.6 kbp fragment has an NheI [GCTAGC (SEQ ID NO:17)]
site and a HindIII [AAGCTT (SEQ ID NO:18)] site added at the 5' end and an XbaI
[TCTAGA (SEQ ID NO:19)] site and an NruI [TCGCGA (SEQ ID NO:20)] site added
at the 3' end [where the 5' end is taken to be the forward direction of the reading frame,
i.e. the amino terminal end of the derived coding sequence, and the 3' end is that derived
from the reverse primer corresponding to the 3' end of the gene and carboxy end of the
derived amino acid sequence]. This confers portability on the collagen fragment.
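
As an illustrative check of the flanking sites and of the stated fragment size, the 12-nucleotide primer tails can be scanned for the four recognition sequences. The tails are copied from primers 1989 and 1903 as printed; the helper names are hypothetical, and the reverse primer tail is reverse-complemented so that both ends are read on the coding strand.

```python
SITES = {"NheI": "GCTAGC", "HindIII": "AAGCTT", "XbaI": "TCTAGA", "NruI": "TCGCGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def clean(seq: str) -> str:
    """Drop codon separators and upper-case."""
    return "".join(ch for ch in seq.upper() if ch in "ACGT")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def site_order(seq: str) -> list:
    """Names of the sites found in seq, in 5' to 3' order."""
    hits = [(seq.find(site), name) for name, site in SITES.items() if site in seq]
    return [name for _, name in sorted(hits)]

FORWARD_TAIL = "gct.agc.aag.ctt"   # lower-case tail of primer 1989
REVERSE_TAIL = "tcg.cga.tct.aga"   # lower-case tail of primer 1903

five_prime_sites = site_order(clean(FORWARD_TAIL))
three_prime_sites = site_order(reverse_complement(clean(REVERSE_TAIL)))

# Size bookkeeping from the text: COL3A1 DNA plus one 12 bp tail at each end.
fragment_bp = 1611 + 2 * 12

print(five_prime_sites, three_prime_sites, fragment_bp)
```

On the coding strand the fragment therefore reads NheI-HindIII-[COL3A1]-XbaI-NruI, and the two tails account for the 24 bp difference between 1611 bp and 1635 bp.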

The 1.6 kbp fragment was cloned into the SmaI site of YEpFlag1 [IBI Catalogue
#13400] so that the coding sequence is fused in frame with the vector expressed Flag
protein. This allows for in frame expression of the introduced collagen gene fragment
as a fusion protein when grown on ethanol. The blunt end cloning was performed by
ligation of the SmaI digested vector sequence [gel purified] and the 1.6 kbp PCR
fragment [gel purified, non-phosphorylated] at 20°C, in the presence of SmaI, to
prevent recircularisation of the vector alone and reduce the level of false positive
transformants obtained. There are no SmaI, NheI, HindIII, XbaI or NruI sites in the
fragment of collagen DNA used in the cloning.

Small scale mini-preparations [prepared using Bio101 columns and described
methods for their use] of DNA from ampicillin resistant transformant colonies of E. coli
were screened by restriction enzyme analysis. 10 ml cultures rather than 1 ml cultures
were required to prepare an adequate level of DNA for analysis, as YEpFlag plasmids
do not appear to be at a high copy number in E. coli.

The fusion protein was of the form: yeast α factor signal sequence for direction
to the ER and commitment to the yeast secretion pathway; yeast α factor propeptide
with cleavage sites for Kex2 endopeptidase, resulting in removal of all α-factor amino
acid residues and generation of a free Flag-tagged amino terminal end; Flag peptide for
detection and tagging of the fusion protein (8 amino acid residues); linker peptide (4
amino acid residues); collagen helix (255 amino acid residues); collagen C-telopeptide
[C-tel] (25 amino acid residues); and C-propeptide [C-pro] (255 amino acid residues)
(for aid in formation of triple helix). The expected Flag-tagged protein consists of 547
amino acid residues with an expected MW of ~60 kDa.
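
The residue count and approximate molecular weight quoted above can be reproduced with simple arithmetic. In this sketch the segment lengths are those listed in the text, while the figure of 110 Da per residue is an assumed rule-of-thumb average for proteins, not a value taken from this document.

```python
# Segments of the mature (α-factor-processed) fusion protein, from the text.
SEGMENTS = [
    ("Flag peptide", 8),
    ("linker", 4),
    ("collagen helix", 255),
    ("C-telopeptide", 25),
    ("C-propeptide", 255),
]

# Assumption: average residue mass of ~110 Da (common rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0

total_residues = sum(length for _, length in SEGMENTS)
approx_mass_kda = total_residues * AVG_RESIDUE_MASS_DA / 1000.0

print(total_residues, round(approx_mass_kda, 1))
```

Under that assumption the five segments sum to 547 residues and roughly 60 kDa, in agreement with the size expected for the Flag-tagged product.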

Expression of the fusion protein in YEpFlag1 is under the control of the ADH2
promoter, which is repressed by glucose but active in the presence of ethanol [a
by-product of glucose metabolism]. There are multiple copies of the vector in individual
yeast transformants due to the presence of the yeast 2 micron origin of replication in the
vector, which leads to elevated expression of the 1.6 kbp PCR collagen fragment when
glucose repression is lifted by consumption of glucose during growth. One unique
feature of this cloning scheme is that inserts of the 1.6 kbp collagen fragment in the
wrong orientation will not form fusion products, as the terminal leucine residue
preceding the stop codon is coded by the codon AAT. In reverse orientation this
generates a stop codon TAA. The result of incorrect insertion is the addition of only a
single leucine coding codon [the stop codon TAA in reverse is AAT] following the
Flag sequence before the protein is terminated.
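
One way to look at the orientation argument is sketched below, on the assumption that the TTA.TAA pair printed in reverse primer 1903 is the leucine-plus-stop sequence being referred to. The sketch only tests whether that hexamer is its own reverse complement; it does not model the reading frame at the SmaI junction, and the helper name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Assumption: the Leu-stop pair as it appears in primer 1903 (TTA.TAA).
LEU_STOP = "TTATAA"

is_palindromic = reverse_complement(LEU_STOP) == LEU_STOP
codons_either_strand = [LEU_STOP[:3], LEU_STOP[3:]]  # TTA (Leu), TAA (stop)

print(is_palindromic, codons_either_strand)
```

Because the hexamer reads identically on both strands, a leucine codon followed by a stop codon is presented whichever way round the fragment is ligated, consistent with early termination of wrongly oriented inserts.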

The amino acid sequence of the Flag-tagged fusion protein at the point of fusion
is N-Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys-[Flag]-Ala-Ser-Lys-Leu-[linker]-Gly-Ala-
Pro-Gly-Pro-Leu-Gly-Ile-Ala-[α-helix] (SEQ ID NO:21).
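
The linker and the first helix residues at the fusion point follow directly from forward primer 1989. The sketch below translates the primer as printed, using the standard genetic code; the function and variable names are illustrative only.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq: str) -> str:
    """Translate a DNA string (separators ignored) in frame 1, one-letter code."""
    seq = "".join(ch for ch in seq.upper() if ch in "ACGT")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

# Forward primer 1989 (SEQ ID NO:15) as printed in the text.
PRIMER_1989 = "gct.agc.aag.ctt.GGA.GCT.CCA.GGC.CCA.CTT.GGG.ATT.GCT.GGG"

peptide = translate(PRIMER_1989)
print(peptide)  # ASKLGAPGPLGIAG
```

The first four residues (ASKL) are the linker contributed by the NheI and HindIII sites, and the following GAPGPLGIA matches the start of the collagen helix segment given in SEQ ID NO:21.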

The YEpFlag collagen construct [hereinafter referred to as
YEpFlagCOLIII1.6kb; Figure 4] was introduced into a tryptophan auxotrophic yeast
strain such as, for example, BJ3505 [a pep4::HIS3 prb-1.6R HIS3 lys2-208 trp1-101
ura3-52 gal2 can1], BJ5462 [α ura3-52 trp1 leu2-1 his3-200 pep4::HIS3 prb-1.6R can1
GAL], (YGSG) JHRY1-5Dα [α his4-519 ura3-52 leu2-3 leu2-112 trp1 pep4-3] or KRYD1
[BJ3505xBJ5462 diploid] by transformation using electroporation, lithium acetate or
spheroplast regeneration. Tryptophan prototroph transformants were obtained and grown
to high cell density in selective media [lacking tryptophan], followed by transfer to
YPHSM, YEPM or YEPD or YEPGal, YEPE as described in the protocol provided
with the YEpFlag expression system [IBI catalogue #13400]. At 3-9 days following
inoculation, 1 ml aliquots of culture were taken and pellets and supernatants separated
by centrifugation at 13000 rpm in a benchtop centrifuge. Total yeast pellets were
resuspended in 100 µl of gel loading buffer [5xSDS] containing PMSF [0.002M],
vortexed vigorously for 2 minutes, and boiled for 5 minutes. The 900 µl supernatants
separated from the pellets were retained, 100 µl 5xSDS/0.002M PMSF was added, and
they were treated as described for the pellets. For both pellets and supernatants, 20 µl
aliquots were assayed by Western blot analysis of SDS-PAGE yeast total protein or of
supernatants [media] following transfer to nitrocellulose and prehybridisation of the
filters in Blotto. Western blotting was carried out using α-Flag MAb M1 [against
N-terminal free Flag] (International Biotechnologies Inc., (Eastman Kodak) Cat. No.
IB13001) or M2 [against Flag] (International Biotechnologies Inc., (Eastman Kodak)
Cat. No. IB13010).

Western blots revealed the presence of a protein band of approximately 60 kDa.
This is the expected size of a protein fusion containing Flag-helix-C-tel-C-pro. After
prolonged incubation the Flag responsive antibodies detected the appearance of the
fusion product in the media. Detection in both pellet and media supernatant with M1
antibody demonstrates that the α factor leader has been completely removed. No
precursor forms with α factor pro-region [glycosylated or not] were observed.

No band corresponding to 60 kDa was obtained which hybridised to M1 or M2
with proteins obtained from untransformed yeast hosts. When yeast transformed with
YEpFlag [no insert] alone was used, bands were obtained in pellets, but only with M2
MAb. These bands correspond to un-secreted α-proregion with C-terminal Flag and
various glycosylated forms of the same. No Flag is detected in supernatants, but this is
to be expected as it is only 8 amino acids long. No expression from the ADH2
promoter for any construct is observed in the presence of glucose.

YEpFlagCOLIII1.6kb was also co-introduced [co-transformed] into yeast
strains such as BJ5462 and KRYD1, which are capable of growth on galactose, along
with pYEUra3 [Clontech; pYEUra3 and its derivatives contain the bidirectional
GAL1-10 promoter. Both the ADH2 and GAL1-10 promoters are repressed by glucose.
The GAL1-10 promoter is induced by galactose] or pYEUra3.2.12 [a modification of
the Clontech parent vector which allows cloning of genes into an EcoRI site without
the necessity of the introduced gene being in the correct reading frame] or
pYEUra3.2.12β#39 [in which the DNA encoding the β subunit of prolyl-4-hydroxylase
(equivalent to protein disulfide isomerase) is cloned into the EcoRI site of
pYEUra3.2.12 under control of the GAL10 promoter] or pYEUra3.2.12β#39α#5 [in
which the DNA encoding the α subunit of P4H is cloned into the BamHI site of
pYEUra3.2.12β#39 under control of the GAL1 promoter].

Transformants were selected on media lacking tryptophan or uracil or lacking
both tryptophan and uracil. As previously done with the tryptophan-selected
transformants obtained above with YEpFlag or YEpFlagCOLIII1.6kb, transformants were
grown in selective media prior to growth in YPHSM, YEPM, YEPD, YEPG or YEPE, and
after 4 days galactose was added to a final concentration of 2%, 0.5% or 0.2%. Total
yeast protein or supernatants were analysed by Western blot analysis as described
above, except that a third MAb [5B5 against the β subunit] (Dako Corporation, Cat. No.
M877) was also used.

Western blot analysis revealed the presence of a ~60 kDa band in trp+ or trp+ ura+
yeast transformed with YEpFlagCOLIII1.6kb, but not YEpFlag alone, when screened
with MAb M1 or M2, as was previously the case with transformants obtained with
single plasmid transformation.

Analysis also showed the presence of a ~60 kDa band in ura+ or ura+ trp+, but not
trp+, yeast transformants transformed with pYEUra3.2.12β#39 or
pYEUra3.2.12β#39α#5, or cotransformed with same plus YEpFlag or
YEpFlagCOLIII1.6kb, when screened with anti-β subunit MAb 5B5, but only following
induction with galactose and only when galactose was between 0.2 and 0.5% and not
at 2%. The expected size for the β subunit is also 60 kDa. This band is not detected by
M1 or M2 in uracil prototrophic yeast transformed with pYEUra3.2.12β#39 or
pYEUra3.2.12β#39α#5 alone.

At the time of the experimentation, an antibody for the detection of expression
of the α subunit from the bidirectional GAL1-10 promoter in pYEUra3.2.12β#39α#5
was not available, but as the promoters for both GAL1 and GAL10 are normally
co-induced and under the control of the same UAS (upstream activation sequence) in
yeast, it was assumed that the α subunit is also transcribed and expressed where the β
subunit is demonstrated to be expressed. To test this, pYEUra3.2.12β#39α#5 /
YEpFlagCOLIII1.6kb co-transformants induced with 0.2% galactose following at
least 4 days growth on YPHSM were examined for their capacity to produce functional
P4H. Galactose was added following the clear demonstration of the expression of
Flag-collagen by a positive response of yeast protein to M1 or M2 in Western blots and
the absence of a response to MAb 5B5 against the β subunit. Following induction with
galactose [16 hrs], protein was again examined and the presence of M1 or M2
responsive bands and 5B5 responsive bands was separately demonstrated. Protein was
transferred to PVDF membrane following SDS-PAGE and the membrane sliced into
strips. Membrane strips containing protein from the region corresponding to the 60 kDa
responsive area were subjected to hydrolysis and amino acid analysis. Amino acid
analysis revealed the presence of hydroxyproline in this material from yeast
co-transformed with YEpFlagCOLIII1.6kb and pYEUra3.2.12β#39α#5 after induction
with 0.2% galactose, but no hydroxyproline was detected with protein from control
samples with or without galactose.

The media used contains peptone derived from bovine protein hydrolysates, but
no hydroxyproline was found in total yeast grown on this media nor in any of the singly
transformed yeast [one vector alone]. Only in yeast co-transformants was
hydroxyproline detected in the 60 kDa bands, and then only when galactose was added.
Uninduced co-transformants [no galactose] in which Flag detected collagen was
expressed did not contain any hydroxyproline in the 60 kDa band excised from PVDF
following transfer. Hydroxyproline was only found in the 60 kDa region and not in
other regions of the blot.

The clear evidence, then, is that following galactose induction of
pYEUra3.2.12β#39α#5 a product is produced in yeast which is capable of hydroxylating
the proline residues of a co-expressed Flag-tagged collagen fragment. Such activity is
not found in yeast untransformed or transformed with pYEUra3.2.12β#39 [no α
subunit] or in uninduced yeast grown on ethanol or glucose.

A clear advantage of this method of co-expression for the production of
hydroxylated collagen in yeast is the co-ordinated expression of the three genes that is
possible in co-transformants. Another advantage is that the α and β subunits
themselves are co-ordinately expressed. A third advantage is that the αβ expression
vector (i.e. pYEUra3.2.12β#39α#5) contains a centromere sequence and behaves as a
mini-chromosome. It is therefore very stable and does not require selection pressure to
be maintained for its stability. The removal of selection pressure in yeast does not
appear to affect the stability of the YEpFlag collagen construct, as it is in very high
copy number, but clearly it is an advantage to be concerned with the maintenance of
only a single plasmid in the absence of selection pressure, rather than balancing the
effects of selection pressure on the stability of three separate plasmids as would be the
case if the α, β and collagen fragments were separately cloned on multicopy vectors.
Also, the use of a bidirectional promoter to express the α and β subunits
simultaneously is of benefit, rather than expressing them from different promoters on
different plasmids in different amounts. The α subunit probably requires the synthesis
of equal or higher levels of the β subunit for its correct assembly into functional P4H
(α2β2) enzyme, and co-ordinated expression appears to be an efficient mechanism to
ensure this.

*1 [Codon numbering for collagen type III alpha 1 chain: ATG, codon #1; codon #1-codon #24, signal
sequence; codon #25-codon #116, N-pro-peptide sequence; codon #117-codon #130, N-telo-peptide
sequence; codon #131-codon #1161, α-helix sequence; codon #1162-codon #1186, C-telo-peptide;
codon #1187-codon #1441, C-pro-peptide; codon #1442, stop] and [corresponding nucleotide numbering
for collagen type III alpha 1 chain: nucleotide #1-72, signal sequence; nucleotide #73-348, N-pro-peptide
sequence; nucleotide #349-390, N-telo-peptide; nucleotide #391-3983, α-helix region; nucleotide
#3984-4058, C-telo-peptide; nucleotide #4059-4823, C-pro-peptide sequence; nucleotide #4824-4826,
stop codon].

Example 3: Use of Yeast Artificial Chromosomes [YACs] for co-ordinated
expression of the α and β subunits of Prolyl-4-hydroxylase [P4H].

pYAC5 [11454 bp] (Kuhn and Ludwig, 1994) was digested with BamHI to
liberate the HIS3 gene [1210 bp] from between the 2 telomere ends and with SalI-NruI
to produce two fragments [left arm: fragment 1, 5448 bp & right arm: fragment 2,
4238 bp] which were gel purified. Fragment 1 was BamHI-telomere end-E. coli
ori-β-lactamase gene [ampicillin-resistance]-TRP1-ARS1-CEN4-tRNAsup-o-SalI.
Fragment 2 was BamHI-telomere end-URA3-NruI.

pYEUra3.2.12β#39α#5 was digested with SalI-EcoRV to produce a P4H
expression cassette fragment of the form
SalI-XbaI-BamHI-α-ATG-BamHI-pGAL1-10-EcoRI-ATG-β-EcoRI-SmaI-EcoRV [4864 bp],
which was gel purified. The expression cassette fragment encoding the α and β
subunits of P4H under the control of a galactose inducible bidirectional promoter was
ligated with fragments 1 and 2 of the BamHI-SalI-NruI digested pYAC5 and the ligation
mix used to transform the following yeast strains: BJ2407 [a/α prb1-1122/prb1-1122
prc1-407/prc1-407 pep4-3/pep4-3 leu2/leu2 trp1/trp1 ura3-52/ura3-52], KRYD1 [a/α
ura3-52/ura3-52 trp1-Δ101/trp1 lys2-208/LYS2 HIS3/his3Δ200 gal2/GAL2 can1/can1
pep4::HIS3/pep4::HIS3 prb1Δ1.6R/prbΔ1.6R], GY1 [a leu2 ade1 trp1 ura3], JHRY1-5Dα
[α his4-519 ura3-52 leu2-3 leu2-112 trp1 pep4-3], and YPH150 [a/α ura3-52/ura3-52
lys2-801a/lys2-801a ade1-101o/ade1-101o leu2Δ1/leu2Δ1 trp1-Δ63/trp1-Δ63
his3Δ200/his3Δ200] using the method for lithium acetate transformation. Yeast strains
were also transformed with pYAC5 digested with BamHI and undigested pYAC5.
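
The fragment sizes quoted in this example can be reconciled with a short calculation. The figures are those given in the text; treating the remainder of pYAC5 as the SalI-NruI piece that is not carried forward, and summing fragment lengths without correcting for bases gained or lost at the ligation junctions, are simplifying assumptions of this sketch.

```python
# Sizes in bp, as stated in the text.
PYAC5_BP = 11454
HIS3_BP = 1210        # BamHI fragment removed from between the telomeres
LEFT_ARM_BP = 5448    # fragment 1 (TRP1-ARS1-CEN4 arm)
RIGHT_ARM_BP = 4238   # fragment 2 (URA3 arm)
CASSETTE_BP = 4864    # SalI-EcoRV P4H expression cassette

# Assumption: whatever is left over is the SalI-NruI piece that is discarded.
discarded_bp = PYAC5_BP - HIS3_BP - LEFT_ARM_BP - RIGHT_ARM_BP

# Approximate size of the linear YAC once the cassette joins the two arms.
assembled_bp = LEFT_ARM_BP + CASSETTE_BP + RIGHT_ARM_BP

print(discarded_bp, assembled_bp)
```

On these figures about 558 bp of pYAC5 is replaced by the 4864 bp cassette, giving a linear construct of roughly 14.5 kbp.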

Ura+ Trp+ co-transformants were obtained for all strains where the two
fragments of pYAC5, each carrying either TRP1 [SalI-CEN4-TRP1-BamHI] [fragment
1] or URA3 [NruI-URA3-BamHI] [fragment 2] as the selectable marker for
transformation, each on one arm of the YAC, had been linked together by the insertion
of the P4H expression cassette into the SalI-EcoRV sites. This vector was designated
pYAC5βα (Figure 5). The vector was of the form BamHI-telomere-URA3-
NruI/EcoRV [both sites destroyed]-β-ATG-pGAL10-1-ATG-α-SalI-tRNAsup-CEN4-
ARS1-TRP1-AMPr-ori-telomere-BamHI. The presence of the CEN4 sequence means
the vector behaves as a stable chromosome during replication and is segregated at least
1 copy per cell at mitosis and meiosis [as was the case for pYEUra3.2.12β#39α#5].
The telomere ends mean that the vector is linear and stable.

Transformants and controls [pYAC5 alone (circular), pYAC5 linearised by
BamHI digestion] were replica plated onto nitrocellulose filters laid over selective
media [SD Complete lacking uracil and tryptophan] or rich media [YEPD] and
incubated 2-5 days at 30°C till confluent. Filters were transferred to selective media
containing galactose [2%] instead of glucose or rich media containing galactose [2%]
as well as glucose media plates and grown at 30°C for periods between 2 h and 72 h. At
the end of incubation colonies were lysed on 0.1% SDS-0.2N NaOH-0.1%
β-mercaptoethanol, washed with water and filters blocked with Blotto. Production of the
α and β subunits of P4H was ascertained by hybridising the treated filters with MAbs
specific for the α [MAb 9-47H10] (ICN Biomedical Inc. Cat. No. 631633) and β [MAb
5B5] subunits. Colonies transformed with pYAC5βα and induced with galactose
showed hybridisation with MAbs against the subunits of P4H, demonstrating
co-ordinated production of α and β from the bi-directional GAL1-10 promoter. Control
filters and control yeast did not produce a response to P4H MAbs. Yeast transformants
carrying pYAC5βα grown on glucose [a repressor of the bi-directional GAL1-10
promoter] also did not produce a positive response.

Positive transformants identified in the above screening procedure were
precultured/grown in 10 ml liquid culture media containing selective media lacking ura
and trp or rich media [containing glucose, glycerol or raffinose]. Aliquots were
transferred to inducing media [selective or rich] containing 0.2-2% galactose. Where
glucose was the carbon source, pellets were washed in sterile water prior to induction.
After 2-20 h further growth at 30°C, cell pellets were collected, suspended in loading
buffer, and total yeast protein separated on SDS-PAGE and western blotted. Filters
were blocked with Blotto and hybridised with MAbs against both of the P4H subunits.
Only those yeast transformants carrying pYAC5βα and induced with galactose gave
the expected 60 kDa bands for α and β subunits. This demonstrates that the P4H
expression cassette has been functionally inserted into pYAC5. The advantage of
having the P4H cassette in the pYAC is twofold: [1] as with the case of
pYEUra3.2.12β#39α#5, the presence of the CEN sequence means that the vector is
stably maintained in this system when selection pressure is removed for growth in rich
media, which increases yield through increased cell density, and [2] the pYAC5βα
construct allows for the subsequent insertion of multiple and different triple helical
protein expression cassettes.

Example 4: Co-expression of collagen/triple helical protein fragment(s) expressed
on a multicopy plasmid and P4H subunits in yeast transformants carrying
pYAC5βα.

Yeast host strains containing pYAC5βα or pYAC5 were transformed with
YEpFlagCOLIII1.6kb or YEpFlag alone. The form of the collagen bearing vector was
circular and multicopy. In this instance, as the YEpFlagCOLIII1.6kb and the pYAC
constructs both contain the same selectable marker, yeast transformants producing
Flag-tagged collagen were identified by colony hybridisation with MAbs against Flag
[M1 or M2]. Colonies were also screened for whether they carried extra copies of the
bla gene [multicopy] by identifying those colonies producing increased levels of
β-lactamase by PADAC assay (Macreadie et al., 1994). In other examples, the multicopy
plasmid could utilise a selectable marker other than the URA3 or TRP1 found on each
arm of the YAC. Various co-transformant types carrying pYAC5βα and
YEpFlagCOLIII1.6kb were assayed as in Example 1 for collagen production, P4H
subunit production, and P4H activity. Those co-transformants containing pYAC5βα
plus YEpFlagCOLIII1.6kb were then screened as described in the previous example
for hydroxylated collagen to identify 60 kDa bands in western blots responding to
MAbs against the α and β subunits and Flag following induction. The α and β subunits
were only identified following galactose induction. Hydroxylated protein was only
identified following induction of both the α and β subunits of P4H.

Example 5: Introduction of collagen expression cassette into pYAC5 and
pYAC5βα.

YEpFlag was linearised by digestion with ScaI, which cuts at a single
recognition site in the ampicillin resistance gene for β-lactamase [bla]. There are no
ScaI sites in the 1.6 kb collagen fragment insert, so ScaI could also be used to linearise
YEpFlagCOLIII1.6kb. Linear DNA was used to transform yeast containing pYAC5 or
pYAC5βα. Yeast transformants producing Flag-tagged collagen were identified by
colony hybridisation with MAbs against Flag [M1 or M2]. Colonies carrying extra
copies of the bla gene [multicopy] were also identified. Those colonies producing
increased levels of β-lactamase by the PADAC assay were found to have inserted a
copy of YEpFlagCOLIII1.6kb into the pYAC5 or pYAC5βα vector of the host strain
and correspond to those colonies positive to MAbs M1 or M2. The increased
β-lactamase activity is a result of gene amplification resulting from homologous
recombination between the linearised bla gene on YEpFlagCOLIII1.6kb and the bla
gene on pYAC. The new plasmids formed by insertion into pYAC5 or pYAC5βα of
the YEpFlagCOLIII1.6kb vector were designated pYAC-COLIII 1.6kb and pYAC
αβ-COLIII 1.6kb (Figure 6). Expression experiments were performed and only those
strains carrying all 3 genes on the YAC [pYAC βα-COLIII 1.6kb] and induced for
P4H with galactose produced hydroxylated collagen.


Example 6: Cloning and expression of a synthetic collagen protein. 

A strategy is described for the generation of "synthetic/novel" collagen proteins
involving the in vitro assembly of synthetic oligonucleotide repeat sequences encoding
the peptides GPP.GPP.GLA (SEQ ID NO:22), GPP.GPP.GER (SEQ ID NO:23),
GPP.GPP.GPA (SEQ ID NO:24) or GPP.GPP.GAP (SEQ ID NO:25). The synthetic
collagen sequences are engineered to contain a high percentage of proline residues, as
this residue has been shown to confer thermal stability to collagen molecules. The
residue pairs chosen in the above peptides for the XY position (i.e. LA, ER, PA or AP)
are selected since they appear in statistically higher amounts in fibrillar collagens.

Mixtures of synthetic oligonucleotides encoding SEQ ID NOs:22, 23, 24 or 25
may be joined together to generate DNA fragments of discrete lengths, encoding
synthetic collagen proteins of discrete molecular size and with different physical
characteristics. These synthetic gene segments can be cloned into various expression
vectors for subsequent production of a collagen product in yeast. An outline of the
strategy for construction of a synthetic oligonucleotide encoding a collagen is shown in
Figure 7 where XY is shown, for the purposes of exemplification only, as ER, LA, AP,
PA.

Such synthetic oligonucleotides have been synthesised and several libraries
containing gene segments of various lengths have been generated by ligating these
oligonucleotides together (maximum visible DNA length approx. 1000 base pairs
coding for a polypeptide of ~350 amino acid residues).
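
The assembly strategy can be illustrated at the peptide level. The sketch below joins the four nine-residue units of SEQ ID NOs:22-25 in a random order and reports the resulting length and proline content. The random choice (with a fixed seed) merely stands in for the ligation of an oligonucleotide mixture and is an assumption, as are the function names; no codon usage is implied.

```python
import random

# The four nine-residue repeat units named in the text (SEQ ID NOs:22-25).
UNITS = ["GPPGPPGLA", "GPPGPPGER", "GPPGPPGPA", "GPPGPPGAP"]

def assemble(n_units: int, seed: int = 0) -> str:
    """Concatenate n_units randomly chosen units (stand-in for ligation)."""
    rng = random.Random(seed)
    return "".join(rng.choice(UNITS) for _ in range(n_units))

def is_gly_x_y(protein: str) -> bool:
    """True if every third residue, starting from the first, is glycine."""
    return len(protein) % 3 == 0 and all(
        protein[i] == "G" for i in range(0, len(protein), 3)
    )

def proline_fraction(protein: str) -> float:
    return protein.count("P") / len(protein)

# 37 units of 27 nt each is the largest whole number fitting in ~1000 bp.
protein = assemble(37)
coding_nt = 3 * len(protein)

print(len(protein), coding_nt, is_gly_x_y(protein), round(proline_fraction(protein), 2))
```

Whatever order the units are joined in, the product keeps the (Gly-X-Y)n register and a proline content between 4/9 and 5/9, and a fragment of about 1000 bp corresponds to a polypeptide of a few hundred residues.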

Example 7: Construction of a synthetic hydroxylated triple helical protein for
stable expression in yeast.

A region of Type III collagen was selected for its known capacity to bind and
activate platelets [through an integrin binding site near -Gly-Leu-Ala-Gly-Ala-Pro-Gly-
Leu-Arg (SEQ ID NO:26)]. Regions of 5 GLY-X-Y repeats to the N-terminal side and
7 GLY-X-Y repeats to the C-terminal side were also included to form the basic repeat
unit for inclusion in the synthetic fragment. The sequence of the repeat was
GGKGDAGAPGERGPP-GLAGAPGLR-GGAGPPGPEGGKGAAGPPGPP (SEQ ID
NO:27). This corresponds to residues 637-681 (nucleotides 1909-2043) in the
COL3A1 gene [with Met = 1]. At the 5'-end of the DNA an EcoRI site and NheI site
were included such that the NheI site provided an initiating methionine. Thus the
sequence at the amino end is MGAPGAP (SEQ ID NO:28), where GAPGAP (SEQ ID
NO:29) is the natural sequence flanking the repeat in COL3A1. The repeat was linked
to a second repeat by a linker which introduced a Bsp120I site for later manipulations
and provided the sequence GGP between the first and second repeat unit. The second
repeat was linked to a third repeat by a linker which introduced a BssHII site [again for
later manipulation] and resulted in the amino acid sequence GAR. The third repeat was
flanked by 2 additional GPP triplets, a GCC triplet and finally GLEGPRG (SEQ ID
NO:30). This was a result of including coding sequence that provided for XhoI, SacII
and NheI sites. These were included for flexibility of cloning at later stages. The NheI
site provides an in frame stop codon.
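
The composition of the basic repeat unit can be verified directly from SEQ ID NO:27. The sketch below splits the unit into Gly-X-Y triplets and converts the residue range to nucleotide positions; the variable and function names are illustrative only.

```python
# The three parts of the repeat unit (SEQ ID NO:27), as hyphenated in the text.
N_FLANK = "GGKGDAGAPGERGPP"
CORE = "GLAGAPGLR"              # platelet/integrin binding region (SEQ ID NO:26)
C_FLANK = "GGAGPPGPEGGKGAAGPPGPP"

def triplets(peptide: str) -> list:
    """Split a peptide into consecutive three-residue Gly-X-Y units."""
    return [peptide[i:i + 3] for i in range(0, len(peptide), 3)]

repeat = N_FLANK + CORE + C_FLANK
all_gly_first = all(t[0] == "G" for t in triplets(repeat))

# Residue numbering with Met = 1; codon n spans nucleotides 3n-2 .. 3n.
first_residue = 637
last_residue = first_residue + len(repeat) - 1
first_nt = 3 * (first_residue - 1) + 1
last_nt = 3 * last_residue

print(len(triplets(N_FLANK)), len(triplets(CORE)), len(triplets(C_FLANK)))
print(last_residue, first_nt, last_nt, all_gly_first)
```

The unit therefore contains 5 + 3 + 7 = 15 Gly-X-Y triplets (45 residues), and residues 637-681 map to nucleotides 1909-2043, as stated.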

The synthetic fragment was initially produced in 3 pieces by PCR using primers
against COL3A1. Fragment 1 was EcoRI-NheI-Met-[GAP]2-[REPEAT]1-Bsp120I.
The primers for this were 5'-aattccatg-ggtgctccaggtgctcc-3' [up] (SEQ ID
NO:31) [primer U101] and 5'-ggcc-acctggtggacctggtgg-3' [down] (SEQ ID
NO:32) [primer D101]. The second PCR fragment used primers
5'-ggccc-ggtggtaagggtgacgc-3' [up] (SEQ ID NO:33) [primer U102] and
5'-cgcgc-acctggtggacctgg-3' [down] (SEQ ID NO:34) [primer D102]. For the third
repeat, the primer pair used was 5'-cgcgc-ggtggtaagggtgacgctgg-3' [up] (SEQ ID
NO:35) [primer U103] and 5'-acaaccctggtggacctggtggacc-tggtggacctgggtgg-3' [down]
(SEQ ID NO:36) [primer D103]. The three fragments from the PCR reactions were gel
purified and ligated together. The DNA from the ligation mixture was then used as the
template for a further round of PCR using primer U101 and a new primer at the 3' end
[5'-ctagccccgcggaccctcgagaccaca-acaaccctggtgg-3'] [down] (SEQ ID NO:37) [primer
D104]. A band of approximately 500 bp was produced and gel purified, digested with
EcoRI-NheI and ligated to pYX141 (Ingenous Cat. No MBV-025-10) [LEU2-CEN-
p786], also digested with EcoRI-NheI, before being transformed into E. coli.
Transformants were screened by PCR using the primers for the second fragment, and
DNA from positive colonies was miniprepped and screened by enzyme digestion with
EcoRI-NheI for the presence of an insert of approximately 500 bp. This storage vector
was designated pYX-SYN-C3-1. The EcoRI-NheI fragment was transferred to
pYX243 [2µ-LEU2-pGAL] (Ingenous Cat. No MBV-035-10) to give pYX-SYN-C3-2,
and this plasmid was introduced into a yeast host cell carrying nucleotide sequences
for the P4H α and β subunits [either pYEUra3.2.12β#39α#5 or
pYACαβ]. Expression following galactose induction was determined using the MAb
2G8/B1 (Werkmeister & Ramshaw, 1991), which recognises the sequence
GLAGAPGLR (SEQ ID NO:38). An EcoRI-SacII fragment from pYX-SYN-C3-2 was
also introduced into the EcoRI-SacII sites of YEpFlag to produce YEpFlag-SYN-C3, and
this too was introduced into a yeast host cell expressing P4H on induction by galactose.
A product of approximately 18 kDa [the expected size of SYN-C3] was detected by
Western blotting in yeast induced with galactose.
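
By way of illustration only, the cloning features attributed to primer D104 can be checked with a short script. The sketch below takes the D104 sequence as printed above, forms its reverse complement (the sense strand), and locates the XhoI and SacII recognition sequences. The choice of reading frame is an assumption made for the sketch: triplets are counted back from the 3' end of the sense strand so that the final codon is the tag triplet at the NheI end. The names used (D104, SITES, tail_peptide and so on) are hypothetical and do not appear in the original description.

```python
# Illustrative check of primer D104 (SEQ ID NO:37); not part of the described method.
COMPLEMENT = str.maketrans("acgt", "tgca")

# Only the codons that occur in the translated tail are listed (standard genetic code).
CODONS = {
    "ggt": "G", "tgt": "C", "ctc": "L", "gag": "E",
    "ccg": "P", "cgg": "R", "ggc": "G", "tag": "*",
}

# Recognition sequences; both are palindromic, so either strand can be searched.
SITES = {"XhoI": "ctcgag", "SacII": "ccgcgg"}

# Primer D104 as printed in the text, with the internal hyphen removed.
D104 = "ctagccccgcggaccctcgagaccaca" + "acaaccctggtgg"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lower-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate a sequence whose length is a multiple of three."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


sense = reverse_complement(D104)
site_positions = {name: sense.find(site) for name, site in SITES.items()}

# Assumption: the frame is fixed by counting triplets back from the 3' end,
# so that the last codon is the tag triplet at the NheI end of the fragment.
tail = sense[-33:]
tail_peptide = translate(tail)

print(sense)
print(site_positions)
print(tail_peptide)
```

Under these assumptions the XhoI and SacII sequences are found in that order on the sense strand, and the frame closes on a tag stop codon, which is consistent with the statement that the NheI site provides an in-frame stop codon.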

The nucleotide sequence for SYN-C3 is provided at Figure 8 together with the 
amino acid sequence of the encoded product. 

Example 8: The use of yeast other than Saccharomyces cerevisiae. 

The GAL1-10 promoter is functional in Kluyveromyces, whilst the ADH2
promoter is constitutively expressed in S. pombe. By shifting the expression cassettes
to appropriate vectors, other yeast hosts can be used. K. lactis, for instance, has been
shown in some instances to display less proteolytic activity for recombinant products.
Alternatively, P. pastoris could be used for multiple integration of the expression
cassette for α and β into the chromosome.

For expression in P. pastoris, the nucleotide sequence described in the previous
example encoding the synthetic triple helical protein [SYN-C3] was inserted into the P.
pastoris vector pPIC9 (Invitrogen, Cat. No. K1710-01) at the EcoRI-NotI sites [pPIC-
SYN-C3]. Following digestion with either BglII or SalI, the plasmid was introduced
into P. pastoris, where it was integrated at either the AOX1 or HIS4 sites for BglII or
SalI respectively. The nucleotide sequences encoding the P4H α and β subunits were
also introduced into P. pastoris, using the EcoRI site of pHIL-D2 (Invitrogen, Cat. No.
K1710-01) for the β subunit with integration at HIS4, and the BamHI site of pHIL-S1
(Invitrogen, Cat. No. K1710-01) for the α subunit with subsequent integration at HIS4.
All three expression cassettes were under the control of the AOX1 promoter and
induced by methanol.

Example 9: Enhanced expression of prolyl-4-hydroxylase α and β subunits from
the GAL1-10 promoter by use of yeast with different backgrounds for control of
galactose induced expression.

The plasmid pYEUra3.2.12β#39α#5 [encoding the α and β subunits of P4H
under the control of the GAL1-10 bidirectional promoter] can be introduced into a
yeast host cell with the following genotype: a or α, ura3 trp1 egd1 btt1. In these cells,
the absence of the products of the EGD1 and BTT1 genes results in higher levels of
galactose induced expression from GAL4 dependent promoters such as GAL2, GAL4,
GAL7, GAL1-10, MEL1 (Hu & Ronne, 1994).

Another mechanism for enhanced expression is the use of a yeast host cell
carrying multiple copies of the GAL4 (Johnston & Hopper, 1982) positive
transcriptional activator under its own controlled induction by galactose. This leads to
enhanced expression as there is no limit to the availability of the transcriptional
activator for the GAL1-10 promoter. Similarly, the yeast host cell could contain
multiple copies of the SGE1 gene (Amakasu et al., 1993), which also leads to enhanced
transcription from galactose induced promoters.

Various combinations of these backgrounds could also be utilised; that is egd1
btt1 SGE1^mc or egd1 btt1 GAL4^mc or egd1 btt1 SGE1^mc GAL4^mc [where mc represents
multiple copies].


Example 10: Expression of collagen from promoters other than ADH2. 

The collagen encoding nucleotide sequence in YEpFlag COL 1.6kb can be
excised as a NheI or HindIII - XbaI or NruI fragment for insertion into other fusion
vectors under the control of other promoters. Alternatively, the pADH2-α signal-A-
proregion-Flag collagen cassette can be excised as a NaeI or SacI - BglII or XbaI or
SpeI or SnaBI or NotI fragment, for example, and introduced into an appropriate vector
such as YEplac181 (Gietz & Sugino, 1988) or pMH158 (Heuterspreute et al., 1985) for
expression in different copy numbers and host backgrounds, or into vectors with CEN
sequences. Alternatively, CEN sequences can be introduced into the YEpFlag vector
itself. The cassette can also be removed without the ADH2 promoter using NruI and
introduced into an appropriate vector behind an appropriate promoter.

Collagen encoding nucleotide sequences can be expressed using the CUP1
promoter in vectors such as pYELC5 (Macreadie et al., 1989) as an alternative to the
ADH2 promoter. This promoter is induced by addition of copper (i.e. copper sulfate)
and may have the advantage of an increased reducing environment and enhancement of
P4H activity during co-expression. A second promoter that can be used is the TIP1
promoter, which is induced by cold shock. Here, expression is induced by shifting
growing yeast from 30°C to 18°C, and at the lower temperature the stability of the
expressed collagen may be enhanced without the need for hydroxylation.

Example 11: 

Two single strand synthetic oligonucleotide sequences encoding complementary
forward and reverse strands with complementary, non-palindromic 5' overhanging
sequences were annealed to produce a double-stranded oligonucleotide molecule with
the following nucleotide sequence:

5'-AGA TCC GGT GGT AAG GGT GAC GCT GGT GCT CCA GGT GAA AGA
           CCA CCA TTC CCA CTG CGA CCA CGA GGT CCA CTT TCT

   GGT CCA CCA GGT TTG GCT GGT GCT CCA GGT TTG AGA GGT GGT GCT
   CCA GGT GGT CCA AAC CGA CCA CGA GGT CCA AAC TCT CCA CCA CGA

   GGT CCA CCA GGT CCA GAA GGT GGT AAG GGT GCT GCT GGT CCA CCA
   CCA GGT GGT CCA GGT CTT CCA CCA TTC CCA CGA CGA CCA GGT GGT

   GGT CCA CCA GGT            (SEQ ID NO: 44)
   CCA GGT GGT CCA TCT AGG-5' (SEQ ID NO: 45)

which gives rise to the repeat unit sequence; 

RSGGKGDAGAPGERGPPGLAGAPGLRGGAGPPGPEGGKGAAGPPGPPG (SEQ ID NO: 46).
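
The internal consistency of the three sequences above can be illustrated with a short script. The following is a sketch only: it assumes that SEQ ID NO: 44 is read 5' to 3' from its first base, that SEQ ID NO: 45 is written 3' to 5' as printed, and that units join head to tail through their 6-base overhangs. All names (TOP, BOTTOM_3_TO_5, concatemer and so on) are hypothetical and are introduced for the example.

```python
# Illustrative consistency check of SEQ ID NOs: 44 to 46; names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the codons that occur in SEQ ID NO: 44 are listed (standard genetic code).
CODONS = {
    "AGA": "R", "TCC": "S", "GGT": "G", "AAG": "K", "GAC": "D",
    "GCT": "A", "CCA": "P", "GAA": "E", "TTG": "L",
}

TOP = (  # SEQ ID NO: 44, written 5' -> 3'
    "AGATCCGGTGGTAAGGGTGACGCTGGTGCTCCAGGTGAAAGA"
    "GGTCCACCAGGTTTGGCTGGTGCTCCAGGTTTGAGAGGTGGTGCT"
    "GGTCCACCAGGTCCAGAAGGTGGTAAGGGTGCTGCTGGTCCACCA"
    "GGTCCACCAGGT"
)

BOTTOM_3_TO_5 = (  # SEQ ID NO: 45, written 3' -> 5' as in the text
    "CCACCATTCCCACTGCGACCACGAGGTCCACTTTCT"
    "CCAGGTGGTCCAAACCGACCACGAGGTCCAAACTCTCCACCACGA"
    "CCAGGTGGTCCAGGTCTTCCACCATTCCCACGACGACCAGGTGGT"
    "CCAGGTGGTCCATCTAGG"
)

REPEAT_UNIT = "RSGGKGDAGAPGERGPPGLAGAPGLRGGAGPPGPEGGKGAAGPPGPPG"  # SEQ ID NO: 46


def complement(seq: str) -> str:
    """Base-by-base complement, without reversing the sequence."""
    return seq.translate(COMPLEMENT)


def translate(seq: str) -> str:
    """Translate a sequence whose length is a multiple of three."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


def concatemer(n: int) -> str:
    """Top strand of n units joined head to tail (assumed ligation product)."""
    return TOP * n


top_overhang = TOP[:6]                # unpaired 5' end of the top strand
bottom_overhang = BOTTOM_3_TO_5[-6:]  # unpaired 5' end of the bottom strand

duplex_pairs = complement(TOP[6:]) == BOTTOM_3_TO_5[:-6]
overhangs_pair = complement(top_overhang) == bottom_overhang
non_palindromic = complement(top_overhang)[::-1] != top_overhang
encodes_repeat = translate(TOP) == REPEAT_UNIT
in_frame_tandem = translate(concatemer(3)) == REPEAT_UNIT * 3

print(len(TOP), duplex_pairs, overhangs_pair, non_palindromic,
      encodes_repeat, in_frame_tandem)
```

Because each unit is 144 bases (48 codons), head-to-tail joining keeps successive units in the same reading frame, and the non-palindromic overhang means that two units can pair only in the head-to-tail orientation.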

Following annealing, the oligonucleotides were then subjected to ligation in the
presence of T4 DNA ligase and fractionated by size. One of the following two
procedures was followed to clone the various repeat units into YEpFlag1, which had
been pre-digested with BglII and BamHI.

In the first approach, the following two linkers were added; 

5'-GATCCGGT            (SEQ ID NO: 47)
       GCCATCTAGG-5'   (SEQ ID NO: 48) to anneal at the forward end

and

5'-AGATCCGGTA          (SEQ ID NO: 49)
        GCCATCTAG-5'   (SEQ ID NO: 50) to anneal at the reverse strand end.

Following phosphorylation, the insert, with the linkers attached, was cloned into
YEpFlag1 pre-digested with BglII and BamHI. This gives rise to a construct encoding
the following amino acid sequence;

Flag     Linker                                                     Linker
DYKDDDDDKEFLEPGRS(GRSGGKGDAGAPGERGPPGLAGAPGLRGGAGPPGPEGGKGAAGPPGPP)nGRSGPVDPK
(SEQ ID NO: 51).

In the second approach, the ends of the oligonucleotide sequences were blunt
ended and then blunt end cloned into the SmaI site of YEpFlag1 to produce the
following amino acid sequence;

Flag     Linker                                                    Linker
DYKDDDDDKEFLEPRS(GRSGGKGDAGAPGERGPPGLAGAPGLRGGAGPPGPEGGKGAAGPPGPP)nGRSIDGSGPVDPR
(SEQ ID NO: 52)

Either of these ligation mixtures was used to transform E. coli cells, and
plasmids containing multiple repeats of the original oligonucleotide sequences were
selected. Those having the correct orientation were confirmed and the individual
plasmids containing variable numbers of the repeat unit were introduced into yeast.
Expression was as described in Example 7.

The method according to the invention provides for the stable expression of
triple helical proteins from yeast host cells. Synthetic products may show enhanced or 
novel functions (e.g. inclusion of RGD and/or YIGSR sequences from fibronectin and 
laminin, and a fusion of a collagen sequence to a platelet derived growth factor (PDGF) 
will provide a protein product useful in wound healing and fibrosis). The products may 
be used in a wide range of applications including bioimplant production, soft and hard 
tissue augmentation, wound/burn dressings, sphincter augmentation for urinary 
incontinence and gastric reflux, periodontal disease, vascular grafts, drug delivery 
systems, cell delivery systems for natural factors and as conduits in nerve regeneration. 

It will be appreciated by persons skilled in the art that numerous variations
and/or modifications may be made to the invention as shown in the specific 
embodiments without departing from the spirit or scope of the invention as broadly 
described. The present embodiments are, therefore, to be considered in all respects as 
illustrative and not restrictive. 



